We are concerned with the problem of finding minima (maxima) of a functional , where is a subset of a (normed) linear space of real-valued (or real-vector-valued) functions. The formulation of a problem of the calculus of variations requires two steps: the specification of a performance criterion, and the statement of the physical constraints that must be satisfied.
Remark 3.1: Performance Criteria
A performance criterion , also called cost functional or simply cost, must be specified for evaluating the performance of a system quantitatively. The typical form of (also called the Lagrange problem of the calculus of variations) is
where is the real independent variable, usually called time; , , is a real vector variable, usually called the phase variable. The functions are generally called trajectories or curves; stands for the derivative of with respect to time; and is a real-valued function, called a Lagrangian function or, briefly, a Lagrangian; the function measures the instantaneous cost and is sometimes called the running cost. Overall, we may thus think of as dependent on a real-vector-valued continuous function . A slightly different performance criterion is:
here an additional function weights the endpoints of the phase trajectory and . This problem is called the Bolza problem. In the same way one can define a third type of cost functional, known as the Mayer problem:
It can be shown that all these formulations are equivalent, meaning that they can be transformed into one another via a change of variables.
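For reference, the three cost functionals can be written in standard notation (the symbols $t_0$, $t_f$, $x$, $\ell$, $\phi$ are our naming conventions, reconstructed from the classical definitions rather than taken from the stripped formulas above):

```latex
% Lagrange problem: running cost only
J_L(x) = \int_{t_0}^{t_f} \ell\bigl(t, x(t), \dot{x}(t)\bigr)\,\mathrm{d}t
% Bolza problem: running cost plus end-point (terminal) cost
J_B(x) = \phi\bigl(t_f, x(t_f)\bigr) + \int_{t_0}^{t_f} \ell\bigl(t, x(t), \dot{x}(t)\bigr)\,\mathrm{d}t
% Mayer problem: terminal cost only
J_M(x) = \phi\bigl(t_f, x(t_f)\bigr)
```

For instance, a Lagrange problem becomes a Mayer problem by appending an extra state $x_{n+1}$ with $\dot{x}_{n+1} = \ell(t, x, \dot{x})$ and $x_{n+1}(t_0) = 0$, so that $J_L(x) = x_{n+1}(t_f)$.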
Definition 3.1: Calculus of Variation Basic Problem
The basic problem of Calculus of Variation is defined as:
For example, we can take as the space of real continuously differentiable functions on , and as the subset of this function space consisting of the continuously differentiable functions that are equal to a fixed at . Enforcing constraints in the optimization problem reduces the set of candidate functions, i.e., not all functions in are allowed. This leads to the following:
Definition 3.2: Admissible Trajectory
A trajectory (we emphasize again that a trajectory is a real- or vector-valued function; we use both notations and ) in a real linear function space is said to be an admissible trajectory provided that it satisfies all the physical constraints (if any) along the interval . The set of admissible trajectories is defined as
Typically, the functions are required to satisfy conditions at their end-points. Problems of the calculus of variations having end-point constraints only are often referred to as free problems of the calculus of variations. Many different boundary conditions are of interest. The simplest one is to fix both end-points, e.g., and . Then, the set of admissible trajectories can be defined as:
In this case, we may say that we seek trajectories joining the fixed points and . Alternatively, we may require that the trajectory join a fixed point to a specified curve . Because the final time is now free, not only the optimal trajectory but also the optimal value of shall be determined. That is, the set of admissible trajectories is now defined as
Besides bound constraints, another type of constraints is often required:
for Lagrangian functionals with Lagrangian functions . These constraints are often referred to as isoperimetric constraints. Similar constraints with sign are sometimes called comparison functionals. Finally, further restrictions may be necessary in practice, such as requiring that for all or part of the optimization horizon . More generally, constraints of the form
for in some interval are called constraints of the Lagrangian form, or path constraints. A discussion of problems having path constraints is deferred until the following chapter on optimal control. We now give a series of examples of classical problems of the Calculus of Variations whose admissible function spaces present some of the constraints discussed previously. The most famous and maybe first problem of the calculus of variations is the Brachistochrone Problem below.
Example 3.1 (Brachistochrone Problem). Consider the problem of finding the curve in the vertical plane joining given points and and such that a material point sliding along without friction from to , under gravity and with initial speed , reaches in minimal time (see Figure 3.1 ). This problem was first formulated and then solved by Johann Bernoulli, more than 300 years ago!
The objective function is the time required for the point to travel from to along the curve
where denotes the arc length of defined by and the velocity along . Since the point is sliding along without friction, energy is conserved,
with being the mass of the point, and , the gravity acceleration. That is, and
The Brachistochrone problem thus formulates as a Lagrange problem of the calculus of variations.
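Putting the pieces together, with $A$ at the origin, the $y$-axis pointing downward, and zero initial speed (our sign conventions; the stripped formulas above may differ), energy conservation gives $v = \sqrt{2gy}$ and the travel-time functional takes the standard form:

```latex
T(y) \;=\; \int_A^B \frac{\mathrm{d}s}{v}
      \;=\; \int_{x_A}^{x_B} \frac{\sqrt{1 + y'(x)^2}}{\sqrt{2\,g\,y(x)}}\,\mathrm{d}x .
```

This is a Lagrange functional with Lagrangian $\ell(x, y, y') = \sqrt{1 + y'^2}/\sqrt{2gy}$, which is the form used again in Example 3.9.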
We note that this problem is a free problem, since only the end-points and were assigned. Another classical problem of the calculus of variations with a Lagrange cost functional is the catenary problem.
Example 3.2 (Hanging rope with fixed length). Given a rope of length attached at two poles and find the function that describes the shape assumed by the rope under the action of gravity. Using the infinitesimal arc length we can express the constraint enforcing the length of the rope as an isoperimetric constraint. That is:
Therefore the admissible space of functions is a subset of a general linear function space that is defined as:
Since we are seeking a static equilibrium condition, we look for a function that minimizes the potential energy. In the continuous setting the potential energy of the rope can be expressed as:
Therefore the catenary problem is:
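In standard notation (with the rope described by $y(x)$ on $[x_a, x_b]$, linear mass density $\rho$, and prescribed length $L$; the symbol names are our assumptions), the catenary problem reads:

```latex
\min_{y}\; \rho g \int_{x_a}^{x_b} y(x)\,\sqrt{1 + y'(x)^2}\,\mathrm{d}x
\quad \text{s.t.} \quad
\int_{x_a}^{x_b} \sqrt{1 + y'(x)^2}\,\mathrm{d}x \;=\; L,
\qquad y(x_a) = y_a,\; y(x_b) = y_b .
```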
A problem similar to the catenary one is Dido's isoperimetric problem which, according to Virgil's Aeneid, dates back to the foundation of Carthage around 850 B.C. Dido's problem is to find the arc that, given the endpoints and the length , maximizes the enclosed area.
Having defined an objective functional and the set of admissible trajectories , one must then decide about the class of functions with respect to which the optimization shall be performed. The traditional choice in the calculus of variations is to consider the class of continuously differentiable functions, e.g., . Yet, as shall be seen later on, an optimal solution may not exist in this class. In response to this, a more general class of functions shall be considered, such as the class of piecewise continuously differentiable functions.
At this point, we need to define what is meant by a minimum (or a maximum) of on . Similar to finite-dimensional optimization problems we shall say that assumes its minimum value at provided that
This assignment is global in nature and may be made without consideration of a norm (or, more generally, a distance). Yet, the specification of a norm permits an analogous description of the local behavior of at a point . In particular, is said to be a local minimum for in , relative to the norm , if
with . Unlike finite-dimensional linear spaces, different norms are not necessarily equivalent in infinite-dimensional linear spaces, in the sense that may be a local minimum with respect to one norm but not with respect to another.
Having chosen the class of functions of interest as several norms can be used. Maybe the most natural choice for a norm on is
since endowed with is a Banach space. Another choice is to endow with the maximum norm of continuous functions:
The maximum norms, and are called the strong norm and the weak norm, respectively. Similarly, we shall endow with the strong norm and the weak norm :
where stands for any norm in . The strong and weak norms lead to the following definitions for a local minimum:
Definition 3.3: Strong Local Minimum, Weak Local Minimum
is said to be a strong local minimum for in if
Likewise, is said to be a weak local minimum for in if
Remark 3.2
Note that for strong (local) minima, we completely disregard the values of the derivatives of the comparison elements . That is, a neighborhood in the topology induced by contains many more curves than one in the topology induced by . In other words, a weak minimum may not necessarily be a strong minimum. Consider the three curves illustrated in Figure 3.3: and are close in the strong sense, that is according to , but not in the weak sense, while and are close according to both and . (Note that, strictly speaking, to take the of we should first smooth its corners.)
These important considerations are illustrated in the following example.
Example 3.4 (A Weak Minimum that is Not a Strong One). Consider the problem P to minimize the functional
on . We first show that the function is a weak local minimum for P in the topology induced by . Consider the open ball of radius 1 centered at , i.e., . For every we have
hence . This proves that is a local minimum for P, since . In the topology induced by , on the other hand, the admissible trajectories are allowed to take arbitrarily large values of . Consider the sequence of functions defined by:
clearly, and for each i.e., . Moreover
meaning that for every there is a such that . Finally
for each . Therefore, the trajectory cannot be a strong local minimum for P.
Prior to deriving conditions that must be satisfied for a function to be a minimum (or a maximum) of a problem of the calculus of variations, one must ask the question whether such solutions actually exist for that problem. In the case of optimization in finite-dimensional Euclidean spaces, it has been shown that a continuous function on a nonempty compact set assumes its minimum (and its maximum) value in that set (see Weierstrass Theorem 2.1). It is possible to extend this result to continuous functionals on a nonempty compact subset of a normed function space. But as attractive as this solution to the problem of establishing the existence of maxima and minima may appear, it is of little help because most of the sets of interest are too ”large” to be compact.
The principal reason is that most of the sets of interest are not bounded with respect to the norms of interest. As just an example, the set
is clearly not compact with respect to the strong norm
as well as the weak norm
for we can construct a sequence of curves in
(e.g., parabolic functions satisfying the boundary conditions) which have maximum values as large as desired. That is, the problem of minimizing, say,
or
on
does not have a solution.
A problem of the calculus of variations that does not have a minimum is addressed in the following:
Example 3.5 ( A Problem with No Minimum ). Consider the problem P to minimize the functional:
Observe first that for any admissible trajectory joining the two end-points, we have
That is,
Now, consider the sequence of functions in defined by Then
so we have . Overall, we have thus shown that . But since for any in , we know with certainty that has no global minimizer on .
In this section, we will derive a set of necessary optimality conditions that a trajectory must satisfy in order to be a candidate solution for a weak local minimum of a cost functional . Note that, if we choose , that is, we work with once continuously differentiable functions, every strong extremum (what we mean by "extremum" will be made clear later on) is automatically a weak extremum; as a consequence, the set of necessary conditions valid for weak extrema is necessary also for strong extrema. We start by considering the basic problem of the calculus of variations according to Definition 3.1, in which the admissible set is:
that is we are looking for a minimizing trajectory in the set of once continuously differentiable vector-valued functions that have given values at the endpoints. Recall that this is a free problem of the calculus of variations.
Definition 3.4: Perturbed functions
Given a function we define a family of perturbed functions as:
where is a small real parameter and are functions such that .
Remark 3.3
Note that by choosing sufficiently small we can make and arbitrarily close in the sense of the norm . We have that:
A pictorial example of perturbed function for the scalar case is given in Figure 3.4
We now give a fundamental definition that can be seen as the generalization of the directional derivative to infinite-dimensional function spaces.
Definition 3.5: First Variation of a Functional, Gâteaux Derivative
Let F be a functional defined on a function space . Then, the first variation of at in the direction also called Gâteaux derivative with respect to at , is defined as:
(provided it exists). If the limit exists for all , then is said to be Gâteaux differentiable at
Remark 3.4
Note that, for fixed and , the functional is actually a real-valued function of , that is, . Then the Gâteaux derivative is simply the first derivative of the function with respect to , evaluated at , i.e.
Remark 3.5
Note that the Gâteaux derivative need not exist in any direction , or it may exist in some directions and not in others. Its existence presupposes that:
Then:
is defined only if this ”ordinary” derivative with respect to the real variable exists at . If the Gâteaux derivative exists, then it is unique.
Example 3.6 (Calculation of a Gâteaux Derivative). Consider the functional . In order to take the Gâteaux derivative we can define and then directly differentiate it. That is:
on the other hand, we can directly take the limit:
Example 3.7 (Non-Existence of a Gâteaux Derivative). Consider the functional . Clearly, is defined on all of , since each continuous function results in a continuous integrand whose integral is finite. For and we have:
therefore:
and a Gâteaux derivative does not exist at in the direction .
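As a numeric sanity check of Definition 3.5, one can approximate a Gâteaux derivative by a difference quotient in the perturbation parameter. The sketch below uses an example functional of our own choosing, $F(y)=\int_0^1 y(t)^2\,\mathrm{d}t$, whose Gâteaux derivative in direction $\omega$ is $2\int_0^1 y(t)\,\omega(t)\,\mathrm{d}t$:

```python
import numpy as np

t = np.linspace(0.0, 1.0, 10_001)

def integral(v):
    # composite trapezoid rule on the fixed grid t
    return float(np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(t)))

def F(y):
    # example functional F(y) = integral of y(t)^2 over [0, 1]
    return integral(y(t) ** 2)

def gateaux(F, y, w, eps=1e-6):
    # central-difference approximation of dF(y; w) = d/de F(y + e*w) at e = 0
    return (F(lambda s: y(s) + eps * w(s))
            - F(lambda s: y(s) - eps * w(s))) / (2.0 * eps)

y = np.sin            # base trajectory y(t) = sin(t)
w = lambda s: s ** 2  # perturbation direction w(t) = t^2

numeric = gateaux(F, y, w)
analytic = 2.0 * integral(np.sin(t) * t ** 2)  # closed form: 2 * int y*w dt
print(numeric, analytic)
```

Because this $F$ is quadratic in the perturbation parameter, the central difference reproduces the analytic value essentially to machine precision.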
Lemma 3.1: Descent Direction
Let be a functional defined on a normed linear space (e.g., a function space). Suppose that has a strictly negative variation at a point in some direction . Then, cannot be a local minimum point for (in the sense of the norm ).
Proof. Since
hence,
since as the points are eventually in each neighborhood of , irrespective of the norm considered on . Thus, local minimum behavior of , in the sense of Definition 3.3 is not possible in the direction at . □
Definition 3.6: -Admissible Directions
Let be a functional defined on a subset of a linear space and let . Then, a direction , is said to be -admissible at for if:
Remark 3.6
Note that if is a -admissible direction then also the parametrized family of directions for all is a -admissible direction. This simple fact follows directly from the definition of -admissible directions.
Remark 3.7: Properties of First Variations
It is instrumental to highlight some properties of the first variation of a functional:
Linearity. Let and be two functionals defined on , and a -admissible direction. Consider a linear combination of the two functionals : it is still a functional, and . This linearity property is inherited from the linearity of the ordinary derivative.
Homogeneity. Let be a functional defined on and a family of -admissible directions; then . In other words, the first variation is a homogeneous operator.
Note also that in general the first variation is not a linear operator in , that is, for and -admissible directions , although it may be true in some special cases. We will require linear Gâteaux derivatives to prove some results for inequality-constrained problems.
Theorem 3.1: Geometric Necessary Conditions for a Local Minimum
Let be a functional defined on a subset of a normed linear space . Suppose that is a local minimum point for on . Then:
Proof. By contradiction, suppose that there exists a -admissible direction such that . Then, by Lemma 3.1, cannot be a local minimum for . Likewise, there cannot be a -admissible direction such that . Indeed, being -admissible and by Lemma 3.1, cannot be a local minimum. Overall, we must have that for each -admissible direction at . □
Remark 3.8
Note that Theorem 3.1 can be interpreted as the usual necessary condition for optimality for unconstrained NLP problems. Let us define . Note that the scalar function is the value of the functional when perturbed around a minimizing trajectory ; that is, we are assuming that is a local minimum. Equivalently, we are stating that is a local minimum for . Therefore a necessary condition for to be a local minimum is . But from the definition of we also have:
where the last statement must hold for all admissible directions .
We now state some important results needed for the derivations of the necessary conditions of optimality.
Lemma 3.2
Let and Suppose that:
then
Proof. By contradiction, suppose there is a such that , say (the reasoning for the case is analogous and is left to the reader as an exercise). By continuity of , there exists a ball around in which is always positive; formally, . Since is arbitrary, we can choose a function that vanishes everywhere except in , where we choose it to be a positive continuous and differentiable function. Then, for such a , we have:
since both and are positive for and thus we reach a contradiction. Hence, . □
Remark 3.9
Note that, with simple modifications, Lemma 3.2 extends to the case:
Where and . We recover the case proved by the Lemma 3.2 by rewriting the previous equation as:
Then it is sufficient to consider functions such that and apply the result of Lemma 3.2 to each equation . In this way we obtain that the vector-valued function for each .
We state another standard instrumental theorem without proof.
Theorem 3.2: Differentiation Under the Integral Sign
Let such that be a continuous function with continuous partial derivative on . Then
is in with the derivative
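Theorem 3.2 can be checked numerically on a concrete integrand of our own choosing, say $G(x) = \int_0^1 \sin(xt)\,\mathrm{d}t$: differentiating $G$ by finite differences should agree with integrating the partial derivative $\partial_x \sin(xt) = t\cos(xt)$.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 20_001)

def integral(v):
    # composite trapezoid rule on the fixed grid t
    return float(np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(t)))

def G(x):
    # G(x) = integral over [0, 1] of sin(x*t) dt
    return integral(np.sin(x * t))

x0, h = 1.3, 1e-5
lhs = (G(x0 + h) - G(x0 - h)) / (2 * h)   # derivative of the integral
rhs = integral(t * np.cos(x0 * t))        # integral of the partial derivative
print(lhs, rhs)
```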
Remark 3.9 and Theorem 3.2 are used to obtain a set of necessary conditions for optimality known as the Euler equations.
Theorem 3.3: Euler’s Necessary Conditions for Optimality
Consider the problem to minimize the functional:
where the functional space is defined as and a continuously differentiable function. Suppose that gives a (local) minimum for on . Then satisfies the Euler equations:
Proof. We start by computing the first variation of the functional at the minimizing trajectory in a -admissible direction :
By Theorem 3.1 we must have:
By Definition 3.6, -admissible directions are functions such that . Note that in this way we have . We modify the expression of the first variation in order to apply Lemma 3.2. Using integration by parts we have:
but since is a -admissible direction we have:
The necessary condition for optimality then becomes:
The last equation satisfies the conditions of Lemma 3.2; therefore we must have:
Definition 3.7: Stationary Function
Each function that satisfies the Euler equations on some interval is called a stationary function for the Lagrangian .
Remark 3.10
Note that the Euler equations are necessary conditions for optimality; therefore a stationary point can be a minimum, a maximum, or neither. In the literature stationary points are also called extrema or extremal trajectories.
Example 3.8 (Simplest calculus of variations problem). Given two points in the plane and , find the curve of minimum length that joins them. A schematic picture is drawn in Figure 3.5 . Obviously, the solution to this problem is the line joining and , but we will work out the solution using the Euler equation. The functional that we want to minimize is the length of the curve; that is, we want to solve the problem:
note that, with respect to the previous notation, is the independent variable and we are seeking a function . We have also indicated . The Lagrangian for this problem is . Note that it is independent of . The Euler equation for this problem reads:
As a consequence:
Since the denominator is always different from zero, we have ; therefore stationary functions have zero second derivative. We can then integrate this condition to derive :
Finally imposing that , that is imposing the boundary conditions and we obtain:
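The computation in Example 3.8 can be reproduced symbolically: form the Euler equation for the arc-length Lagrangian $f(y') = \sqrt{1 + y'^2}$ and verify that a straight line satisfies it identically (a sketch; the symbol names are ours).

```python
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

# Lagrangian of the arc-length functional depends only on y'
yp = y(x).diff(x)
f = sp.sqrt(1 + yp**2)

# Euler equation: d/dx (df/dy') - df/dy = 0
euler = sp.simplify(sp.diff(f.diff(yp), x) - f.diff(y(x)))
print(euler)  # proportional to y''(x): it vanishes iff y'' = 0

# A straight line y = a*x + b satisfies the equation identically
a, b = sp.symbols('a b')
print(sp.simplify(euler.subs(y(x), a * x + b).doit()))
```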
Remark 3.11
Depending on the structure of the Lagrangian function , the Euler equations can be simplified.
f does not explicitly depend on the function . Suppose ; then the Euler equations become .
f does not explicitly depend on the derivative . Suppose ; then the Euler equations become . This is a degenerate case.
f does not explicitly depend on the independent variable . Suppose ; we define the Hamiltonian function . If is a stationary point, then the Hamiltonian is constant along it. The total time derivative of the Hamiltonian is:
Therefore an equivalent necessary condition is , where is a constant. This is also known as the Beltrami identity.
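For completeness, here is a sketch of the computation behind the Beltrami identity, using the common convention $H = \dot{x}\, f_{\dot{x}} - f$ (sign conventions for $H$ vary across texts):

```latex
\frac{\mathrm{d}H}{\mathrm{d}t}
= \ddot{x}\, f_{\dot{x}} + \dot{x}\,\frac{\mathrm{d}}{\mathrm{d}t} f_{\dot{x}}
  - f_t - f_x \dot{x} - f_{\dot{x}}\, \ddot{x}
= \dot{x}\left(\frac{\mathrm{d}}{\mathrm{d}t} f_{\dot{x}} - f_x\right) - f_t .
```

Along a stationary trajectory the Euler equation annihilates the bracketed term, so if $f_t = 0$ then $H$ is constant.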
Example 3.9 (A more challenging problem: The Brachistochrone). Recall the Brachistochrone Problem 3.1 :
For the sake of simplicity, we consider a slightly simplified problem. Assume that the initial velocity is and the initial point is the origin, that is . The Lagrangian for this problem is . Note that the Lagrangian is independent of ; hence the Hamiltonian is constant along an extremal. Therefore, we have:
Which is equivalent to:
for a new constant . Thus we have:
that is equivalent to:
In order to solve the previous integral we make the substitution ; therefore . The integral becomes:
where is a constant to be determined. Overall we have obtained a curve in the parameter that is:
This is the parametric form of a family of cycloids depending on the constants (which in turn depends on ) and . The first boundary condition allows us to easily find in terms of . That is:
Thus, we obtain a smaller family of cycloids passing through the origin as shown in Figure 3.6 whose equations are:
note that these are all minimum-time curves. The second boundary condition is somewhat more involved and is solved numerically; the nonlinear system is:
The solution joining points and is shown in red in Figure 3.6 . The constant .
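The nonlinear system for the cycloid constants can be solved with a few Newton iterations. The endpoint below is hypothetical (the text's actual numbers for $B$ are not reproduced here); we use the cycloid through the origin $x = R(\theta - \sin\theta)$, $y = R(1 - \cos\theta)$ with $y$ measured downward.

```python
import numpy as np

# Hypothetical endpoint B = (x_B, y_B), with A at the origin and y downward
x_B, y_B = 2.0, 1.0

def residual(p):
    # Cycloid through the origin: x = R*(th - sin th), y = R*(1 - cos th)
    R, th = p
    return np.array([R * (th - np.sin(th)) - x_B,
                     R * (1.0 - np.cos(th)) - y_B])

def jacobian(p):
    R, th = p
    return np.array([[th - np.sin(th), R * (1.0 - np.cos(th))],
                     [1.0 - np.cos(th), R * np.sin(th)]])

# Newton's method on the 2x2 nonlinear system for (R, theta_B)
p = np.array([0.6, 3.0])  # initial guess
for _ in range(50):
    p = p - np.linalg.solve(jacobian(p), residual(p))

R, th_B = p
print(R, th_B)  # constants of the cycloid joining A and the assumed B
```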
Remark 3.12: Hamilton’s Interpretation of Lagrangian Mechanics
According to the principle of least action, a system in motion always follows the trajectory that minimizes the action functional:
where are the generalized lagrangian coordinates of an degrees of freedom system. is the kinetic energy while is the potential energy. The Euler necessary condition reads:
which is the usual Lagrange equation of analytical mechanics. In view of this mechanical interpretation, recall the definition of the Hamiltonian . In general the kinetic energy has the form . In this case the Hamiltonian reads:
where is the total energy of the system. According to Remark 3.11, if the Lagrangian is independent of time, the Hamiltonian is constant along an extremal trajectory. In other words, the total energy of the system is conserved. Finally, recall that the momentum of a mechanical system can be defined as ; hence the special case when the Lagrangian is independent of in Remark 3.11 can be viewed as momentum conservation, that is .
Analogously to NLP problems, for problems of the calculus of variations we can also derive a second-order necessary condition. Recall that in the finite-dimensional NLP case the second-order necessary condition was the positive semi-definiteness of the Hessian matrix, which is equivalent to the objective function being locally convex in the neighbourhood of a stationary point. Similar conditions hold in the infinite-dimensional case. First, we need the following definition:
Definition 3.8: Second Variation of a Functional
Let F be a functional defined on a function space . Then, the second variation of at in the direction is defined as:
(provided it exists).
Theorem 3.4: Second-Order Necessary Conditions (Legendre)
Consider the problem to minimize the functional:
where the functional space is defined as such that and a continuously differentiable function. Suppose that gives a (local) minimum for on . Then satisfies the Euler equation along with the so-called Legendre condition:
Proof. Based on the differentiability properties of Theorem 3.2 we can compute the second variation as:
where we have used the compressed notation . Substituting gives:
Define the real-valued function of , . We have that satisfies the necessary conditions for a local minimum of an unconstrained optimization problem, since is a local minimum for the functional . Therefore and . Note that . Hence we have:
The last equation must hold true for each perturbation such that . We use Einstein’s notation to imply the double summation, that is:
to simplify the notation let’s define . Note that due to Schwarz’s theorem is symmetric. Using integration by parts we have:
then, using the symmetry of A and the boundary conditions on , we have:
hence we have:
The necessary condition on the second variation becomes:
by Lemma 3.3 (stated below), a necessary condition for to be nonnegative is:
Lemma 3.3
Let and be given continuous symmetric matrix functions on and let the quadratic functional:
be defined for all such that . Then, a necessary condition for the quadratic functional to be nonnegative for all such is that for each .
It would be tempting to extend the analogy with NLP problems and look for a sufficient condition. Recall that a sufficient condition for a local minimum of an unconstrained NLP was stationarity together with strict convexity of the Hessian at the candidate solution, see Theorem 2.8. Unfortunately, the requirements that be a stationary function for and that the Lagrangian be strictly convex with respect to the third argument are not sufficient for to be a (weak) local minimum of a free problem of the calculus of variations. Additional conditions must hold, such as the absence of points conjugate to the point in ; this condition is called Jacobi's sufficient condition. We refer the reader to the vast literature on this topic. Instead, we give a sufficient condition for a global minimum that relies on the Lagrangian being jointly convex in the second and third arguments. Unfortunately this condition is seldom satisfied in practice. Recall that for a continuously differentiable function , joint convexity means:
Theorem 3.5: Sufficient Conditions for a Weak Global Minimum
Consider the problem to minimize the functional:
where the functional space is defined as such that , and is a continuously differentiable function. Suppose that the Lagrangian is [strictly] jointly convex in . If is a stationary function for the Lagrangian (i.e., it satisfies the Euler necessary conditions), then is also a [strict] global minimizer for on .
Proof.
where we have used the joint convexity of in the second and third arguments. Subsequently, we make use of integration by parts and the fact that and are equal at the boundary, that is and , together with the hypothesis that is a stationary point. Overall we have:
which is the definition of global minimum of a functional. □
Example 3.10. Consider the problem P to minimize the functional
on . The Lagrangian is convex in and does not depend on ; then, thanks to Theorem 3.5, every stationary point is a global minimum. The Euler necessary condition in this case reduces to:
therefore by double integration and imposing the boundary condition we obtain:
that is a straight line joining the origin and the point . The extremal function is the unique global minimum thanks to Theorem 3.5 since it is the unique stationary point of the functional to be minimized.
We now consider slightly different and more involved problems. Consider the problem in the following definition:
Definition 3.9: Calculus of Variation Free End-point problem
Consider the Calculus of Variations problem with free end-point and end-time:
Remark 3.13
The cost functional in Definition 3.9 represents a Bolza problem. Note that neither the end-time nor the end-point is specified, and our search space of functions is in some sense larger. Our search space is defined as for sufficiently large so that it can include . We need to derive additional conditions that an extremal must satisfy. These conditions should allow us to find an equation for the end-point and the end-time at which this end-point is reached. Informally, we are then looking for additional equations: equations are needed for , while one equation is needed for the final time .
We give without proof an important theorem that is instrumental to derive optimality conditions for this problem.
Theorem 3.6: Leibniz's Rule
Let such that be a continuous function with continuous partial derivative on and . Then
Theorem 3.7: Necessary conditions for free end-point and end-time problems
Consider the problem in Definition 3.9. If is a local minimum point then must satisfy:
Proof. Recall that the geometric necessary condition in Theorem 3.1 does not assume any specific form for the functional . Hence, a necessary condition for a local minimum of a free end-point problem is again for all -admissible directions . Note that, in order to be a -admissible direction for this problem, must be such that . We highlight again the fact that is not given and that need not satisfy any boundary condition at . Since is not fixed, we need to allow variations of the end-time around the optimal time ; thus we consider end-time variations of the form . It is essential to notice that the modulating parameter is the same for variations of the end-time and of the trajectory . With this assumption we do not lose generality, since and are arbitrary. The first variation of the Bolza-type functional is:
we carry out the derivative term by term. Note that in the first term we can apply Theorem 3.6. Thus we have:
again we have used the compressed notation . Using integration by parts and the boundary condition we have:
while the second term in the expression of the first variation is:
the geometric necessary condition for optimality is therefore:
since the necessary condition must hold along any -admissible direction and final end-point variation , the underlying rationale is to choose variations that allow us to derive a set of necessary conditions in the form of differential equations or boundary conditions. First we consider a family of variations such that and such that holds at the second end-point as well. Note that these variations are indeed -admissible. Therefore we have:
Note that the previous equation is exactly the equation from which we derived the Euler necessary condition. We conclude that the Euler equation is a necessary condition and must hold at a stationary point in this case as well.
Note that is yet to be determined. Now we consider a second family of -admissible directions such that but is free to vary. Since Euler equation is satisfied we have:
we can conclude that:
this is a set of boundary conditions at that implicitly defines . These conditions are called transversality conditions. Finally, we use a family of variations that nullifies the term , hence variations that satisfy while is left free to vary. Again, the Euler equation is satisfied and thus we have:
Therefore we have:
That is a scalar equation that implicitly defines . Note that the first two terms are precisely the definition of the Hamiltonian evaluated at the final optimal instant. The condition can be rewritten as
Remark 3.14
A similar reasoning holds for more general Bolza problems where the function depends also on the initial time, that is . In this case, neither the initial point nor the initial time is fixed, and hence the -admissible directions need not satisfy . In this case, the transversality condition and the end-point condition on the Hamiltonian must hold at the initial time as well.
Remark 3.15
The free end-point problem of Lagrange (i.e., when ) can be handled in the same way; applying the necessary conditions of Theorem 3.7 yields the so-called natural boundary conditions:
these boundary conditions must hold at a stationary point, together with the Euler equations. Note that if is fixed while is left free, only the second equation must hold, and vice versa.
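In the usual symbols (a reconstruction for a trajectory $x(t)$ with Lagrangian $f(t, x, \dot{x})$ and free $x(t_f)$ and $t_f$), the natural boundary conditions take the form:

```latex
\left.\frac{\partial f}{\partial \dot{x}}\right|_{t = t_f} = 0,
\qquad
\left.\Bigl( f - \dot{x}^{\mathsf{T}}\,\frac{\partial f}{\partial \dot{x}} \Bigr)\right|_{t = t_f} = 0 .
```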
Example 3.11. Consider the problem to minimize the functional:
in which the final time is fixed but not the end-point ; is a scalar weight. In this case is a strongly convex nonnegative function that penalizes the distance of the final end-point from the given target . We highlight again the fact that is not given and must be found using the transversality condition. The term under the integral sign is a strongly convex function that penalizes large values of the derivative of . Informally, given an initial point , we want to find the curve that strikes a trade-off between having a small derivative along its trajectory and approaching the target as closely as possible. Obviously the relative weight of these objectives depends on the magnitude of . For the sake of simplicity, let us consider . The Euler necessary condition reads:
note that we can only assign the initial boundary condition . The transversality condition reads:
Therefore the extremal trajectory is:
note that as we have that , that is, in the limit of an infinite weight . This would be the solution of the Lagrange problem with the same Lagrangian and and the endpoint fixed. On the other hand, for a vanishing weight (i.e. ), we have that tends to a constant value throughout the trajectory. Indeed, a constant trajectory is the global minimum 5 for the functional where only the initial point is fixed. The extremal trajectories for a finite are straight lines in the interval that reach the optimal trade-off between minimizing the value of the derivative and reaching the target. A simple solution with and is shown in Figure 3.7. In this simple case we can also compute analytically the optimal value of the cost functional as a function of the weight . Since and , we have:
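The trade-off described above can be reproduced numerically. The sketch below assumes, for illustration, the running cost x'(t)^2 and the terminal penalty w*(x(T) - z)^2 with x(0) = x0 (the names x0, z, T, w are our own, introduced only for this sketch): the Euler equation then forces straight-line extremals, and the transversality condition fixes the slope.

```python
# Numerical sketch of Example 3.11 under assumed symbols: minimize
# J(x) = \int_0^T x'(t)^2 dt + w*(x(T) - z)^2 with x(0) = x0.
# The Euler equation gives x'' = 0, so extremals are straight lines
# x(t) = x0 + m*t; the transversality condition 2*m + 2*w*(x(T) - z) = 0
# yields the slope m = w*(z - x0) / (1 + w*T).

def optimal_slope(x0, z, T, w):
    """Slope of the extremal straight line from the transversality condition."""
    return w * (z - x0) / (1.0 + w * T)

def cost(m, x0, z, T, w):
    """Cost of the straight line x(t) = x0 + m*t (the integral equals m^2 * T)."""
    return m**2 * T + w * (x0 + m * T - z)**2

x0, z, T = 0.0, 1.0, 1.0
for w in (0.1, 1.0, 10.0, 1000.0):
    m = optimal_slope(x0, z, T, w)
    # the optimal cost in closed form is w*(z - x0)^2 / (1 + w*T)
    print(w, m, cost(m, x0, z, T, w))
```

As the weight grows the slope tends to (z - x0)/T (the target is reached), while for a vanishing weight the slope tends to zero (a constant trajectory), matching the two limits discussed above.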
In all the problems examined so far, the functions defining the class for optimization were required to be continuously differentiable that is . Yet, it is natural to wonder whether cornered trajectories, i.e., trajectories represented by piecewise continuously differentiable functions, might not yield improved results. Besides improvement, it is also natural to wonder whether those problems of the calculus of variations which do not have extremals in the class of functions actually have extremals in the larger class of piecewise functions.
Definition 3.10: Piecewise functions
A real-valued function is said to be piecewise , denoted , if there is a finite (irreducible) partition such that may be regarded as a function in for each . When present, the interior points are called corner points of .
Some remarks are in order. Observe first that, when there are no corners, then . Further, for any is defined and continuous on except at its corner points where it has distinct limiting values ; such discontinuities are said to be simple, and is said to be piecewise continuous on , denoted . Figure 3.8 illustrates the effect of the discontinuities of in producing corners on the graph of . Without these corners, might resemble the function whose graph is presented for comparison. In particular, each piecewise function is “almost” , in the sense that it is only necessary to round out the corners to produce the graph of a function. These considerations are formalized by the following Lemma:
Lemma 3.4: Smoothing of Piecewise Functions
Let . Then, for each there exists such that except in a neighborhood of each corner point of . Moreover, where is a constant determined by .
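A concrete instance of this smoothing (with a corner function of our own choosing) can be sketched as follows: the corner of |t| at the origin is rounded by a parabola on a small neighborhood, yielding a C^1 function that agrees with |t| elsewhere and differs from it by O(d).

```python
# Smoothing the corner of x(t) = |t| at t = 0 (illustrative choice): replace
# it on [-d, d] by the parabola t^2/(2d) + d/2. Values and one-sided slopes
# match at t = +/- d, so the result is C^1, and the maximum gap is d/2 at t = 0,
# i.e. of the order of d, as the lemma asserts.

def x_corner(t):
    return abs(t)

def x_smooth(t, d):
    if abs(t) <= d:
        return t * t / (2.0 * d) + d / 2.0
    return abs(t)

d = 0.1
# the two functions agree at the junction points +/- d ...
assert abs(x_smooth(d, d) - x_corner(d)) < 1e-12
# ... and their maximum gap on a fine grid is d/2, attained at t = 0
gap = max(abs(x_smooth(k * 1e-3, d) - x_corner(k * 1e-3)) for k in range(-500, 501))
print(gap)
```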
Likewise, we shall consider the -dimensional vector-valued analogue of , consisting of those functions with components . The corners of such are, by definition, those of any one of its components . Note that the above lemma can be applied to each component of a given and shows that can be approximated by a which agrees with it except in prescribed neighborhoods of its corner points. Both the real-valued and real-vector-valued classes of piecewise functions form linear spaces, of which the subsets of functions are subspaces. Indeed, it is obvious that a constant multiple of one of these functions is another of the same kind, and the sum of two such functions exhibits piecewise continuous differentiability with respect to a suitable partition of the underlying interval . Since , we have:
defines a norm on . Moreover,
can be shown to be another norm on , with a suitable partition for (the space of vector-valued piecewise functions is treated analogously). By analogy with the linear space of functions, the maximum norms and are referred to as the strong norm and the weak norm, respectively; the functions which are locally extremal with respect to the former [latter] norm are said to be strong [weak] extremal functions.
A natural question that arises when considering the class of functions is whether a (local) extremal point for a functional in the class of functions also gives a (local) extremal point for this functional in the larger class of functions. We state the following Theorem:
Theorem 3.8: Extremals vs. Extremals
If gives a [local] extremal point for the functional:
on with , then also gives a [local] extremal point for on , with respect to the same or norm.
On the other hand, a functional may have no extremals, yet be extremized by a function. We shall first seek weak (local) extremals, i.e., extremal trajectories with respect to some weak neighborhood .
on , where and its partials are continuous on ; one can therefore duplicate the analysis of the previous section. This is done by splitting the integral into a finite sum of integrals with continuously differentiable integrands, then differentiating each under the integral sign. Overall, it can be shown that a (weak, local) extremal must be stationary on intervals excluding corner points, i.e., the Euler equation is satisfied on except at corner points of .
Besides the necessary conditions of optimality on intervals excluding corner points of a local extremal , the discontinuities of that are permitted at each corner are restricted. These restrictions are the so-called first Weierstrass-Erdmann corner conditions.
Theorem 3.9: First Weierstrass-Erdmann Corner Condition
Let be a (weak) local extremal of the problem to minimize the functional
on where and its partials are continuous on . Then, at every (possible) corner point of we have:
where and denote the left and right time derivative of at respectively.
Proof. Integrating both sides of the Euler equation between and for each component gives:
therefore the function is continuous at each even though may be discontinuous at that point. That is . Moreover, being continuous in its arguments, being continuous at , and having finite limits at we get:
for each □
Remark 3.16
The first Weierstrass-Erdmann condition of Theorem 3.9 shows that the discontinuities of which are permitted at corner points of a local extremal trajectory are those which preserve the continuity of . Likewise, it can be shown that the continuity of the Hamiltonian must be preserved at corner points of , that is:
which yields the so-called second Weierstrass-Erdmann corner condition.
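Both corner conditions can be checked numerically on a simple case. The sketch below uses the classic double-well Lagrangian L(u) = (u^2 - 1)^2 as an assumed stand-in (it is not necessarily the Lagrangian of any example in this text): a corner joining the slopes +1 and -1 is admissible because both p = dL/du and the Hamiltonian are continuous across it.

```python
# Weierstrass-Erdmann check for the assumed Lagrangian L(u) = (u^2 - 1)^2.
# A corner joining the left slope u- = 1 and the right slope u+ = -1 preserves
# the continuity of p = dL/du and of H = u*p - L, so both corner conditions hold.

def L(u):
    return (u * u - 1.0) ** 2

def p(u):                      # p = dL/du = 4 u (u^2 - 1)
    return 4.0 * u * (u * u - 1.0)

def H(u):                      # Hamiltonian along a trajectory with slope u
    return u * p(u) - L(u)

u_minus, u_plus = 1.0, -1.0
print(p(u_minus), p(u_plus))   # first corner condition: equal values
print(H(u_minus), H(u_plus))   # second corner condition: equal values
```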
Example 3.12. Consider the problem to minimize the functional:
where . The Lagrangian is independent of the independent variable ; we thus have constancy of the Hamiltonian along a stationary trajectory. Thus:
for some constant . Upon simplification, we get:
in order to solve this differential equation we make the substitution and thence ; we get the somewhat simpler equation that can be solved by separation of variables and has the general solution:
where is a constant of integration. In turn, substituting back , we conclude that a stationary point must be of the form:
In particular, the boundary conditions and produce constants and . However, the resulting stationary function:
is defined only for or . Thus, there is no stationary function for the Lagrangian in . Next, we turn to the problem of minimizing in the larger set . Suppose that is a local minimizer for on . Then, by the Weierstrass-Erdmann condition, we must have:
which gives:
by definition of corner points, we must have , hence corner points are only allowed at those such that . Observe that since and thus the functional is bounded below by . This means that if we are able to find a function that satisfies the necessary conditions and achieves , then is the global optimum of the problem. From the boundary condition we have that and , and from the Weierstrass-Erdmann condition we have that we can only have corner points at time instants such that . We can then construct a function that is from to where it has a corner point and grows linearly with constant derivative equal to up to so that . More precisely:
such a function achieves the global minimum and satisfies the necessary conditions (i.e. it has a corner point at where ) and thence it is the unique global minimum point for on . The global optimum is plotted in Figure 3.9 .
Corollary 3.1: Absence of corner points
Consider the problem to minimize the functional:
on . If is a strictly monotone function of (or, equivalently, is a convex function in on ), for each , then an extremal solution cannot have corner points.
Proof. By the first Weierstrass-Erdmann corner condition, at a corner point it holds that:
Let us define the vector-valued function . By hypothesis, is strictly monotone in and thus it cannot assume the same value twice. The Weierstrass-Erdmann condition can be written in terms of as:
but by the definition of a corner point , thus contradicting the strict monotonicity of . □
Example 3.13 (Minimum Path Problems). Consider again the problem in Example 3.8, that is, to minimize the distance between two fixed points, namely and in the -plane. We have shown that extremal trajectories for this problem correspond to straight lines. But could we have extremal trajectories with corner points? The answer is no: the Lagrangian is and is a strictly monotone function in , hence we can apply Corollary 3.1 and conclude that extremal trajectories for this problem cannot have corner points.
The Gâteaux derivatives of a functional are obtained by comparing its value at a point with those at points in a weak norm neighborhood. In contrast to these (weak) variations, we now consider a new type of (strong) variations whose smallness does not imply that of their derivatives. In the scalar case 6 , we consider variations defined as:
where and and are positive real coefficients such that and .
Note that strong variations of this kind depend on three parameters . Informally speaking, is the point at which the perturbation is centered, while determines the extension in time of the perturbation. Note that the conditions and constrain the variation to lie within the open interval . The parameter modulates the magnitude of the variation and of its derivative. We are now ready to state a set of necessary conditions for a strong local minimum, whose proof is based on the foregoing class of variations.
Theorem 3.10: Weierstrass’ Necessary Condition
Consider the problem to minimize the functional:
on . Suppose gives a strong (local) minimum for on . Then,
at every and for each . ( is referred to as the excess function of Weierstrass ).
Proof. For the sake of clarity, we shall present and prove this condition for scalar functions only. Let . Note that both and being functions, so is . These smoothness conditions are sufficient to calculate , as well as its derivative with respect to at . Note that and differ only in the interval . Hence, by the definition of , we have:
note that we have split the integral at the points of discontinuity of the derivative of . The difference quotient is:
where we have applied Theorem 3.6. In order to analyse the second term we define ; its time derivative is , and thus we have:
using the first-order Taylor series expansion in under the integral sign we have:
where the arguments have been compressed for notational simplicity. Therefore we have:
upon integration by parts of the term involving we obtain:
note that the term since the Euler equations are necessary conditions for optimality. Then, by the definition of , we have ; finally, the last term reduces to:
since and . Since is a local strong minimizer we have , for all sufficiently small , and thence also in the limit
and thus:
for every and . □
Example 3.14 (Minimum path problem II). Consider again the problem in Example 3.8 that is to minimize the distance between two fixed points, namely and in the -plane. We have shown that extremal trajectories for this problem correspond to straight lines joining and , that is:
where and . We now ask whether is a strong minimum for that problem. The Weierstrass excess function is:
note that in this simple case it can be regarded as a function of and only. The excess function is plotted as a function of for different values of in Figure 3.11. It is easily checked that it is always nonnegative; therefore is also a strong local minimum for the minimum path problem. In fact, we note that the function is convex 7 . The Weierstrass condition is equivalent to:
which holds true for every , being precisely the first-order characterization of convexity of .
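The nonnegativity of the excess function for the minimum-path Lagrangian L(u) = sqrt(1 + u^2) can also be verified numerically, as a complement to Figure 3.11:

```python
import math

# Weierstrass excess function for the minimum-path Lagrangian
# L(u) = sqrt(1 + u^2): E(p, w) = L(w) - L(p) - (w - p) * L'(p).
# Convexity of L makes E nonnegative everywhere, vanishing at w = p.

def L(u):
    return math.sqrt(1.0 + u * u)

def dL(u):
    return u / math.sqrt(1.0 + u * u)

def excess(p, w):
    return L(w) - L(p) - (w - p) * dL(p)

grid = [-5.0 + 0.1 * i for i in range(101)]
emin = min(excess(p, w) for p in grid for w in grid)
print(emin)  # nonnegative up to rounding (zero when w == p)
```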
The following corollary indicates that the Weierstrass condition is also useful to detect strong (local) minima in the class of functions.
Corollary 3.2: Weierstrass’ Necessary Condition
Consider the problem to minimize the functional:
where the functional space is defined as and is a continuously differentiable function. Suppose that gives a (local) minimum for on . Then:
at every and for each .
Remark 3.17: Weierstrass’ Condition and Convexity
It is readily seen that the Weierstrass condition of Theorem 3.10 is satisfied automatically when the Lagrangian function is partially convex (and continuously differentiable) in , for each .
Remark 3.18: Weierstrass’ Condition and Pontryagin’s Maximum Principle
Interestingly enough, the Weierstrass’ condition can be rewritten as:
which given the definition of Hamiltonian gives:
for each and for each . This necessary condition prefigures Pontryagin’s Maximum Principle in optimal control theory.
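This rewriting can be illustrated numerically: freezing the "costate" at lam = L'(p), the Weierstrass condition says that u maps to lam*u - L(u) attains its maximum exactly at the extremal slope u = p. The sketch below uses the minimum-path Lagrangian as an assumed example:

```python
import math

# Weierstrass condition as a maximum principle (sketch with the assumed
# Lagrangian L(u) = sqrt(1 + u^2)): with lam = L'(p) frozen at the extremal
# slope p, the function H(u) = lam*u - L(u) is concave and maximized at u = p,
# prefiguring Pontryagin's Maximum Principle.

def L(u):
    return math.sqrt(1.0 + u * u)

p = 0.75
lam = p / math.sqrt(1.0 + p * p)   # L'(p)

def H(u):
    return lam * u - L(u)

grid = [-10.0 + 0.01 * i for i in range(2001)]
u_best = max(grid, key=H)
print(u_best)  # the grid point at (essentially) the extremal slope p = 0.75
```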
We now turn our attention to problems of the calculus of variations in the presence of constraints. Similar to finite-dimensional optimization problems, it is more convenient to use Lagrange multipliers in order to derive the necessary conditions of optimality associated with such problems; these considerations are discussed in the following for equality constraints. The method of Lagrange multipliers is then used in to obtain necessary conditions of optimality for problems subject to (equality) terminal constraints and isoperimetric constraints, respectively. The Lagrange multiplier methodology is then extended to inequality constrained problems.
We start by considering a general class of equality constraints where the function space is defined by level-set curves of one or more functionals .
Definition 3.11: Equality constrained calculus of variations problem
Consider the problem to minimize:
where .
The functionals are defined in as well.
Remark 3.19
Note that the basic problem of the calculus of variations we have addressed so far also falls into this broader category. Consider the functionals and , and recall that for the basic problem we had . We can then define the same as .
Remark 3.20: Common types of constraint functionals
This kind of problem allows us to treat more general problems such as endpoint constrained problems and isoperimetric problems. In the endpoint constrained case the constraint functionals take the form for . The constraint reduces to ; in this way the final state is forced to satisfy . Isoperimetric problems are a class of problems where the constraint functionals are defined as and the constraint is expressed as the level-set where . Such constraints are found in minimum-volume [area] problems with fixed area [perimeter], for example the catenary or Dido’s problem, which we will solve in the following sections.
The next theorem, which we give without proof, is instrumental in proving a fundamental lemma on the existence of minimizers of such equality constrained problems.
Theorem 3.11: Inverse Function Theorem
Let and . If a function has continuous first partial derivatives in each component, with nonvanishing Jacobian determinant at , then provides a continuously invertible mapping between and a region containing a full neighborhood of .
The following Lemma gives conditions under which a point in a normed linear space cannot be a (local) extremal of a functional , when constrained to the level set of another functional .
Lemma 3.5
Let and be functionals defined in a neighborhood of in a normed linear space , and let be the equality constraint defining . Suppose that there exist fixed directions such that the Gâteaux derivatives of and satisfy the Jacobian condition:
and are continuous in a neighborhood of (in each direction , ). Then, cannot be a local extremal for when constrained to .
Proof. Consider the auxiliary functions and ; we have and , and by definition of the Gâteaux derivative , , and . Let us also define the function as:
the Jacobian determinant of at is:
by hypothesis we have and thence Theorem 3.11 is applicable. Therefore, under , the points of a full neighborhood of the origin in the plane are mapped onto a neighborhood of , as shown in Figure 3.12. This means that we can find and such that:
by definition of the functions and we have:
that means that is admissible (i.e. satisfies the constraint ) and reduces the cost, that is ; hence cannot be a local extremal trajectory. □
With this preparation, it is easy to derive necessary conditions for a local extremal in the presence of equality or inequality constraints, and in particular the existence of the Lagrange multipliers.
Theorem 3.12: Existence of the Lagrange Multipliers
Let and be functionals defined in a neighborhood of in a normed linear space and having continuous Gâteaux derivatives in this neighborhood. Let also and suppose that is a (local) extremal for constrained to . Suppose further that for some direction . Then, there exists a scalar such that:
Proof. Since is a (local) extremal for constrained to , by Lemma 3.5 we must have that the determinant:
for any . Hence, by defining:
it follows that for each . □
Remark 3.21
As in the finite-dimensional case, the parameter in Theorem 3.12 is called a Lagrange multiplier. Using the terminology of directional derivatives appropriate to , the Lagrange condition says simply that the directional derivatives of are proportional to those of at . Thus, in general, Lagrange’s condition means that the level sets of both and at share the same tangent hyperplane at , i.e., they meet tangentially. Note that Lagrange’s condition can also be written in the form
This is due to the linearity of the Gâteaux derivative: given two scalars and two functionals from to , we have that , as shown in Remark 3.7. This suggests, as we will see in the following, consideration of the Lagrangian functional .
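A finite-dimensional sketch of this tangency (a toy problem of our own, not from the text) makes the proportionality of directional derivatives concrete:

```python
# Finite-dimensional analogue of Lagrange's condition: minimize
# J(x, y) = x^2 + y^2 on the level set K(x, y) = x + y - 1 = 0. At the
# constrained minimizer (1/2, 1/2), grad J = lam * grad K, so every
# directional derivative of J equals lam times that of K.

def grad_J(x, y):
    return (2.0 * x, 2.0 * y)

def grad_K(x, y):
    return (1.0, 1.0)

xs, ys = 0.5, 0.5
gJ, gK = grad_J(xs, ys), grad_K(xs, ys)
lam = gJ[0] / gK[0]
print(lam)  # the multiplier; here lam = 1
for d in ((1.0, -1.0), (0.3, 0.7), (2.0, 1.0)):
    dJ = gJ[0] * d[0] + gJ[1] * d[1]   # directional derivative of J along d
    dK = gK[0] * d[0] + gK[1] * d[1]   # directional derivative of K along d
    print(dJ, lam * dK)                # proportional, as Theorem 3.12 states
```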
It is possible, albeit technical, to extend the method of Lagrange multipliers to problems involving any finite number of constraint functionals:
Theorem 3.13: Existence of the Lagrange Multipliers (Multiple Constraints)
Let and be functionals defined in a neighborhood of in a normed linear space and having continuous Gâteaux derivatives in this neighborhood. Let also and suppose that is a (local) extremal for constrained to . Suppose further that:
for (independent) directions . Then, there exists a constant vector such that:
Remark 3.22: Link to Nonlinear Optimization
The previous theorem is the generalization of Theorem 2.13 of Chapter 2 to optimization problems in normed linear spaces. Note, in particular, that the requirement that be a regular point for the Lagrange multipliers to exist is generalized by a non-singularity condition in terms of the Gâteaux derivatives of the constraint functionals . Yet, this condition is in general not sufficient to guarantee uniqueness of the Lagrange multipliers.
Remark 3.23: Hybrid Method of Admissible Directions and Lagrange Multipliers
If with a subset of a normed linear space and the -admissible directions form a linear subspace of (i.e., for every scalars ), then the conclusions of Theorem 3.12 remain valid when further restricting the continuity requirement for to and considering -admissible directions only. Said differently, Theorem 3.12 can be applied to determine (local) extremals to the functional constrained to . This extension leads to a more efficient but admittedly hybrid approach to certain problems involving multiple constraints. Those constraints on which determine a domain having a linear subspace of -admissible directions, may be taken into account by simply restricting the set of admissible directions when applying the method of Lagrangian multipliers to the remaining constraints.
Necessary conditions of optimality for problems with end-point constraints and isoperimetric constraints shall be obtained with this hybrid approach in the next subsections.
So far, we have only considered those problems with either free or fixed end-time , and end-points , . In this subsection, we shall consider problems having end-point constraints of the form with being specified or not. In the case is free, shall be considered as an additional variable in the optimization problem. As in 3.3.1, we shall then define the functions by extension on a “sufficiently” large interval , and consider the linear space , supplied with the weak norm . In particular, Theorem 3.13 applies readily by specializing the normed linear space to and considering the Gâteaux derivative at in the direction . These considerations yield necessary conditions of optimality for problems with end-point constraints as given in the following:
Theorem 3.14: Transversality Conditions
Consider the problem to minimize the functional
on with and . Suppose that gives a (local) minimum for on and at . Then, is a solution to the Euler equation of Theorem 3.3 satisfying both the end-point constraints and the transversal condition:
in the particular one-dimensional case (i.e. ), the transversality condition reduces to:
Proof. Observe first that by fixing and varying in the -admissible direction such that we show as in the proof of Theorem 3.3 that must be a solution to the Euler equation on . Observe also that the right end-point constraints may be expressed as the zero-level set of the functionals:
then simplifying the terms using the Euler equation as in the proof of Theorem 3.7 we obtain the first variations of the cost and constraint functionals as:
where the usual compressed notation is used. Based on the differentiability assumptions on and it is clear that these Gâteaux derivatives exist and are continuous. Further, since the rank condition holds at one can always find (independent) directions such that the regularity condition:
is satisfied. Now, consider the linear subspace
. Since
gives a (local) minimum for
on
by Theorem 3.13 and Remark 3.23 there exist a vector
such that:
The above equation can be rewritten using the vectorial form as:
that must hold for every . In particular, we consider variations such that so that the second term is simplified and we get:
that must hold for each admissible and thus we get:
Now considering variations such that while is such that we get for each :
and thus we have the equation:
for each . In vectorial notation we have the following dimensional system of equations:
that following our definitions of and becomes:
that is a linear system of equations expressed as , where . Note that is the Jacobian matrix, whose dimensions are , and by hypothesis its rank is . Therefore, by the rank–nullity theorem, we can pick a -dimensional vector and post-multiply the previous equation by it; noting that since , we get:
note that if belongs to the kernel of the Jacobian at , then also does; thus it also holds:
Substituting the definition of and we get:
Remark 3.24: one-dimensional case
For the one-dimensional case we have:
note that reduces to a row vector in ; the kernel is then simply the subspace of vectors in such that:
by picking and the transversality condition reduces to:
Remark 3.25
Note that ; therefore its Jacobian is a matrix of the form:
and it is a matrix-valued function of that is . Its rank can be at most when all of its rows are independent.
Example 3.15 (Minimum Path Problem with Variable End-Point and End-Time). Consider the problem to minimize the distance between a fixed point and the point in the -plane. We want to find the curve such that the functional is minimized, subject to the constraints and . Note that neither nor are explicitly given. Instead, they are implicitly given by the scalar end-point constraint , where is obviously a function. We stress the fact that (i.e. the zero level-set of the function ) is a straight line in . Therefore, we are looking for the minimum path from a fixed point to a point on a given straight line. For the sake of simplicity let’s consider ; from Theorem 3.14 we know that an extremal must satisfy the Euler equation in the interval . Note that is yet unknown. As we have already seen, the extremals for minimum path problems are straight lines. Since we have assigned a boundary condition in , we get:
that is a family of straight lines passing through the origin. In order to find and we apply the necessary condition of Theorem 3.14 adapted to the one-dimensional case, that is:
Substituting , , , we get:
and upon simplification:
that precisely means that the extremal curve is perpendicular to the straight line defined by the constraint. In this simple case this is a global condition, but in general the transversality condition is a local condition at the end-point, relating the “gradient” of the constraint and the extremal trajectory. Note that we have also implicitly found as the intersection between the lines and , that is . A graphical representation of this problem is shown in Figure 3.13.
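A numerical sketch with an assumed constraint line x + y = 2 (the example's actual data are not restated here) confirms the perpendicularity interpretation: the shortest segment from the origin to the line ends at the foot of the perpendicular.

```python
# Minimum path from the origin to an assumed line x + y = 2: the closest
# point is the foot of the perpendicular, so the extremal segment is
# orthogonal to the line's direction, as the transversality condition states.

def closest_point_on_line(n, c, p0):
    """Foot of the perpendicular from p0 onto the line n . x = c."""
    nx, ny = n
    t = (c - nx * p0[0] - ny * p0[1]) / (nx * nx + ny * ny)
    return (p0[0] + t * nx, p0[1] + t * ny)

foot = closest_point_on_line((1.0, 1.0), 2.0, (0.0, 0.0))
print(foot)  # (1.0, 1.0)

seg = foot                    # direction of the extremal segment from the origin
line_dir = (1.0, -1.0)        # direction of the constraint line x + y = 2
dot = seg[0] * line_dir[0] + seg[1] * line_dir[1]
print(dot)  # 0: the extremal meets the constraint line perpendicularly
```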
An isoperimetric problem of the calculus of variations is a problem wherein one or more constraints involve a functional in the form of an integral over part or all of the integration horizon . Typically:
where is a given number. Note that this problem already fits the framework that we have treated so far since the constraint is in the form of a level-set of the functional . We will use the hybrid approach in Remark 3.23 to deal both with simpler constraints imposed by the function space (e.g. fixed end-points) and with the isoperimetric constraint .
Theorem 3.15: First-Order Necessary Conditions for Isoperimetric Problems
Consider the problem to minimize the functional:
on subject to the isoperimetric constraints:
with and . Suppose that gives a (local) minimum for this problem, and
for (independent) directions . Then, there exists a constant vector such that is a solution to the so called Euler-Lagrange’s equation:
where:
Proof. Remark first that, from the differentiability assumptions on and the Gâteaux derivatives and , , exist and are continuous for every .
Since gives a (local) minimum for on constrained to , and is a regular point for the constraints, by Theorem 3.13 (and Remark 3.23 ), there exists a constant vector such that:
for each -admissible direction . Observe that this latter condition is equivalent to that of finding a minimizer to the functional:
on . The conclusion then directly follows upon applying Theorem 3.3. □
Remark 3.26: First Integrals
Similar to free problems of the calculus of variations (see Remark 3.11) it is easy to show that the Hamiltonian function defined as
is constant along an extremal trajectory provided that does not depend on the independent variable . Recall the definition and that along an extremal trajectory the necessary condition of Theorem 3.15 holds, that is:
The total time derivative of the Hamiltonian is:
Example 3.16 (Solution to Dido’s Problem). We are finally ready to face the problem shown in Example 3.3. Dido’s Problem is to minimize the functional:
subject to the isoperimetric 9 constraint:
The subspace is . Now we build the function as:
Since is independent of , the Hamiltonian function is constant along an extremal trajectory, that is:
in this case we have , hence:
Manipulating the previous relation we get:
that is a separable nonlinear differential equation; upon squaring both sides and after some algebraic manipulation we get:
and by formally expressing and integrating both sides we have:
In order to solve the integral in we make the substitution , and hence:
in a similar way as for the brachistochrone problem we get the extremal trajectory in parametric form as:
where we have defined . Note that define a circular arc of radius and centered at . This is easy to see considering that:
and thus:
however, we still need to compute the actual value of the radius and the center coordinates . Assume for the sake of simplicity that , and . Substituting the boundary conditions in the previous equation gives:
by subtracting these two conditions we have:
therefore the center of this family of circles is in and passes through and . We still need to compute the radius and the ordinate of the center . Indeed, these two parameters depend on the isoperimetric constraint (i.e. the length of the circle) that is given. The length of an arc of a circle of radius between and (yet unknown) is ; this comes from the definition of radians, and obviously the same relation holds if we compute it analytically as:
and can be expressed in terms of by using the boundary condition at and :
Since the arctangent is an odd function, we have , while from the Cartesian expression of the circle with = 0 at the point we get:
finally the expression for the length presents the only unknown :
that, given , can be solved numerically for , that is, the ordinate of the center of the circle; then, using , we can retrieve the value of the Lagrange multiplier which, interestingly enough, in this case is the radius of the circle with its sign changed. as a function of is plotted in Figure 3.14. The extremal for the simple case that gives and is plotted in Figure 3.15.
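The final numerical step can be sketched as follows, under assumed data (endpoints (-1, 0) and (1, 0), prescribed arc length ell strictly between the chord length 2 and the half-circle length pi): the extremal is the arc of a circle of radius R centered at (0, y_c) with R^2 = 1 + y_c^2, and the length condition is solved for R by bisection.

```python
import math

# Dido's problem, last step (assumed data): a circular arc of radius R through
# (-1, 0) and (1, 0), centered on the y-axis, has length 2*R*asin(1/R) (minor
# arc). This is a decreasing function of R, from pi at R = 1 down to 2 as
# R -> infinity, so a prescribed ell in (2, pi) fixes R by bisection.

def arc_length(R):
    return 2.0 * R * math.asin(1.0 / R)

def solve_radius(ell, lo=1.0 + 1e-12, hi=1e6):
    """Bisection on the bracket arc_length(lo) > ell > arc_length(hi)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if arc_length(mid) > ell:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

ell = 2.5
R = solve_radius(ell)
y_c = -math.sqrt(R * R - 1.0)   # center below the chord for a minor arc
print(R, y_c, arc_length(R))
```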
Example 3.17 (Problem of the Surface of Revolution of Minimum Area). Consider the problem to find the smooth curve having a fixed length joining two given points and and generating a surface of revolution around the -axis of minimum area. In mathematical terms, the problem consists of finding a minimizer of the functional:
on subject to the isoperimetric constraint:
Let us drop the coefficient in and introduce the Lagrangian as
since does not depend on the independent variable , we use the fact that must be constant along an extremal trajectory,
for some constant . That is:
that can be rearranged as:
that with some simplification gives:
again, we solve the integral by substituting and , and thus:
and by using the relation we obtain:
and assuming we have:
finally, substituting back in , we obtain:
The constants , and are to be found from the boundary conditions and the isoperimetric constraints. Note that the extremal curves are a family of catenaries. For the sake of simplicity, let us reason in reverse: we assign and compute the boundary conditions. Note that is a translation along the -axis, while is a translation along the -axis. With , and , we get the results of Figures 3.16 and 3.17, where the length of the curve is:
note that the length does not depend directly on the Lagrange multiplier . The surface area is:
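The closed-form length of the catenary can be cross-checked by quadrature; the parameters a and b below are illustrative assumptions, not the values used in the figures.

```python
import math

# Check on the catenary extremal: for y(x) = a*cosh(x/a) on [-b, b], the arc
# length has the closed form 2*a*sinh(b/a), since sqrt(1 + y'(x)^2) = cosh(x/a).
# We confirm it by composite Simpson quadrature.

def simpson(f, lo, hi, n=1000):        # n must be even
    h = (hi - lo) / n
    s = f(lo) + f(hi)
    s += 4.0 * sum(f(lo + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2.0 * sum(f(lo + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3.0

a, b = 1.0, 1.0
length_quad = simpson(lambda x: math.cosh(x / a), -b, b)
length_closed = 2.0 * a * math.sinh(b / a)
print(length_quad, length_closed)  # agree to quadrature accuracy
```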
Example 3.18 (Solution to hanging rope with fixed length). We give the solution to the problem shown in Example 3.2. Given a rope of length attached to two poles and , find the function that describes the shape assumed by the rope under the action of gravity. We want to minimize the functional expressing the potential energy, under the isoperimetric constraint expressing the fixed length. That is:
on subject to the isoperimetric constraint
This is exactly the same problem as the surface of minimum area. The rope assumes the shape of a catenary:
where again are to be found from the boundary conditions and the length .
The method of Lagrange multipliers can also be used to address problems of the calculus of variations having inequality constraints (or mixed equality and inequality constraints), as shown by the following:
Theorem 3.16: Existence and Uniqueness of the Lagrange Multipliers (Inequality Constraints)
Let and be functionals defined in a neighborhood of in a normed linear space and having continuous and linear Gâteaux derivatives in this neighborhood. Suppose that is a (local) minimum point for constrained to for some constant vector . Suppose further that constraints, say for simplicity, are active at and satisfy the regularity condition
for (independent) directions (the remaining constraints being inactive). Then, there exists a vector such that:
and
for
Proof. Since the inequality constraints are inactive, the nonnegativity conditions and complementary slackness are trivially satisfied by taking . On the other hand, since the inequality constraints are active and satisfy a regularity condition at , the conclusion that there exists a unique vector such that the “stationarity condition” holds follows from Theorem 3.12; moreover, the complementary slackness condition is trivially satisfied for , since is active. Hence, it suffices to prove that the Lagrange multipliers cannot assume negative values when is a (local) minimum.
We show the result by contradiction. Without loss of generality, suppose that , and consider the matrix A defined by
By hypothesis, , hence the null space of has dimension lower than or equal to . But from the stationarity condition the nonzero vector . That is, has dimension equal to and only if such that . Because there does not exist a in such that for every . Thus, by Gordan’s Theorem, there exists a nonzero vector in such that , or equivalently:
The Gâteaux derivatives of being linear (by assumption), we get:
In particular,
that is, being a local minimum of on ,
and we get
which contradicts the inequality obtained earlier. □